
microTaboo: a general and practical solution to the k-disjoint problem


Abstract

Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can be achieved efficiently by k-mer (k consecutive nucleotides) counting. However, several areas would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable.

Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining.

Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo
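
To make the k-disjoint definition in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not microTaboo's algorithm, which the abstract does not describe; it only restates the definition (a window of length W is kept if its Hamming distance to every other window is greater than k). The function names `hamming` and `k_disjoint_windows` are hypothetical, and the sketch assumes a single input sequence rather than a set of genomes. Its quadratic cost makes it suitable only for tiny inputs.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def k_disjoint_windows(seq: str, w: int, k: int) -> List[Tuple[int, str]]:
    """Return (start, window) for every length-w window of seq whose Hamming
    distance to every other length-w window of seq is greater than k.

    Illustrative brute force only: compares all pairs of windows.
    """
    windows = [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]
    result = []
    for i, win in windows:
        # A window is k-disjoint if no other window lies within k mismatches.
        if all(hamming(win, other) > k for j, other in windows if j != i):
            result.append((i, win))
    return result


if __name__ == "__main__":
    # Toy input; W = 4, k = 2.
    for start, window in k_disjoint_windows("ACGTTGCA", 4, 2):
        print(start, window)
```

With k = 0 the check reduces to plain uniqueness, which is what k-mer counting gives; raising k tightens the requirement, so fewer windows qualify.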
